FOREWORD



This study represents a portion of the in-house research program of Project 1121, 
Technical Training Development; Task 112103, Evaluating Individual Proficiency and 
Technical Training Programs. Dr. Marty R. Rockway was the project scientist and Capt 
Wayne S. Sellman was the task scientist. This report covers research performed between
February 1971 and April 1971. 

The author wishes to express appreciation to 1st Lt Edward M. Gardner and Mr.
Gerald S. Walker for their assistance in debugging and running statistical routines from 
the Laboratory’s computer library. 

This report has been reviewed and is approved. 

George K. Patterson, Colonel, USAF 
Commander 
ABSTRACT



This study demonstrates and discusses a new procedure for performing item analysis which utilizes multiple discriminant analysis to establish efficiently and effectively an index of item validity. Application of this statistical technique to data derived from an attitude survey of three groups of students enrolled in technical training courses yielded the following results: it disclosed those stimulus items which were responsive enough to discriminate among criterion groups; it partitioned the total discriminatory power of the items into two homogeneous components; it yielded data for arriving at a special weighting scheme for scoring the final attitude form; and it located the positions of the criterion groups relative to the two orthogonal dimensions of the attitude universe.



SUMMARY 



Federico, Pat-Anthony. Identifying item validity indices utilizing a multivariate model. AFHRL-TR-71-16.
Lowry AFB, Colo.: Technical Training Division, Air Force Human Resources Laboratory, April 197 .



Problem 

An index of item validity is typically computed to ascertain how well an item discriminates in agreement with the rest of a test, or how well an item predicts some external criterion. Like other item analysis techniques, it is used in the selection of the best items from which to compose a final test or attitude form. The purpose of this study was to introduce a new method of determining an index of item validity, an index which can be established efficiently and effectively by the utilization of multiple discriminant analysis (DSCRIM).



Approach 

As part of a task to identify valid and reliable psychometric measures of student attitudes towards Air 
Force technical training, DSCRIM was performed on data derived from an attitude survey of three groups 
of trainees enrolled in courses at the Technical Training Center, Lowry Air Force Base, Colorado.

Results and Conclusions 

It was demonstrated that DSCRIM could extract many different kinds of information from data 
which normally would have been obtained from the execution of numerous item analysis techniques. 
Namely, DSCRIM accomplished the following results: it disclosed those stimulus items which were 
responsive enough to discriminate among criterion groups; it partitioned the total discriminatory power of
the items into two homogeneous components; it yielded data for arriving at a special weighting scheme for
scoring the final form; and it located the positions of the criterion groups relative to the two orthogonal 
dimensions of the attitude universe. Not only did DSCRIM establish several distinct item validity indices 
but also it did so more efficiently (it derived these different indices simultaneously with minimum effort) 
and more effectively (it practically exhausted the information obtainable from the data concerning item 
selection) than other item analysis procedures. 

This summary was prepared by Pat-Anthony Federico, Technical Training Division, Air Force Human 
Resources Laboratory. 
IDENTIFYING ITEM VALIDITY INDICES
UTILIZING A MULTIVARIATE MODEL




I. INTRODUCTION

According to Guilford (1954), item analysis is typically used in the selection of the best items from which to compose a final test form. Starting with a surplus number of items, the writer can retain those which meet certain criteria specified by several statistics. Item analysis is performed to establish indices of item difficulty and item validity. As an index of item difficulty, it determines how troublesome an item was for a particular population. As an index of item validity, it determines how well an item measures or discriminates in agreement with the rest of the test, or how well an item predicts some external criterion.

The most common statistics computed for the item difficulty index are the empirical probability that the population involved will pass a particular item (Coombs, 1950), or the correlation of an item with an internal criterion (usually the total score) obtained from only an extreme part of the population (Michael, Hertzka, & Perry, 1953). Other statistics that have been proposed as indices of item difficulty have been described by Gulliksen (1950). The most common procedures followed for establishing the item validity index are: the division of the total sample into groups according to a criterion and the determination of whether these groups behave differently with respect to an item (Lawshe, 1942); the subdivision of the sample on the basis of total scores into seven groups at intervals of .6 standard deviation and the calculation of the proportion of each subgroup who pass the item (Ferguson, 1942); the correlation of an item with an external criterion score in some way (Guilford, 1954); and the implementation of analysis of variance techniques which are claimed to extract the utmost information for item analysis (Baker, 1939).
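The two most familiar statistics named above, the proportion of a population passing an item and the correlation of an item with the total score, can be sketched in a few lines. This is a minimal Python/NumPy illustration, assuming dichotomously scored items (1 = pass, 0 = fail); the function names are hypothetical, and removing the item from the total (the `corrected` option) is a common refinement rather than something the cited authors are claimed to have used.

```python
import numpy as np


def item_difficulty(item_scores):
    """Empirical proportion of examinees passing a 0/1-scored item."""
    return float(np.mean(np.asarray(item_scores, dtype=float)))


def item_total_correlation(responses, item, corrected=True):
    """Pearson correlation between one item and the total score.

    responses: (N, k) array of item scores; item: column index.
    With corrected=True the item is removed from the total so that
    it is not partly correlated with itself.
    """
    responses = np.asarray(responses, dtype=float)
    total = responses.sum(axis=1)
    if corrected:
        total = total - responses[:, item]
    return float(np.corrcoef(responses[:, item], total)[0, 1])


# Four hypothetical examinees answering three items
responses = [[1, 1, 0],
             [1, 0, 0],
             [1, 1, 1],
             [0, 0, 0]]
p0 = item_difficulty([row[0] for row in responses])  # 0.75
r0 = item_total_correlation(responses, item=0)       # about 0.522
```

Leaving the item in the total (`corrected=False`) inflates the correlation, which is why the corrected form is often preferred when the test is short.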

The purpose of this article is to demonstrate a new method of conducting item analysis: an index of item validity that can be established, by the utilization of multiple discriminant analysis, more efficiently and effectively than by the methods mentioned above. Multiple discriminant analysis is a statistical routine for determining a linear combination of p variables which, better than any other linear combination, discriminates among n groups. Specifically, the ratio of the among-groups sum-of-squares of this linear combination to its within-groups sum-of-squares (the discriminant criterion) has a larger value than that for any other linear combination of the same variables. This optimum linear combination is called the discriminant function. Considering this combination as the one and only discriminant function, in effect, makes a linear ordering of the n groups. Consequently, the discriminant function thus defined does not necessarily utilize all the information in the data relevant to group separation, except when the group centroids (multivariate means) are in fact collinear. When n is greater than 2, more than one discriminant function is likely to be necessary to exhaust the information in the data relevant to group separation. The first linear combination maximizes the discriminant criterion; the second linear combination maximizes the ratio of the residual among-groups sum-of-squares to the within-groups sum-of-squares after the effect of the first linear combination has been removed; the third linear combination maximizes the ratio of the corresponding sums-of-squares after the effects of the first two have been removed; and so forth. These successive linear combinations are referred to as multiple discriminant functions; they are mutually orthogonal in the sense that the scores they yield are uncorrelated. For n groups there are n - 1 of these, except in the unusual case when the number of variables, p, is smaller than n - 1; in that case there are p discriminant functions (Fisher, 1938; Bryan, 1951; Tiedeman, 1951; Rao, 1952; Tatsuoka & Tiedeman, 1954; Wert, Neidt, & Ahmann, 1954; Cooley & Lohnes, 1966).
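The definition above can be stated compactly: with B and W the among-groups and within-groups sums-of-squares-and-cross-products matrices, each weight vector v maximizes v'Bv / v'Wv, which leads to the generalized eigenproblem Bv = λWv; the eigenvalues λ are the discriminant criteria, and only min(p, n - 1) of them are nonzero. The sketch below is a minimal NumPy implementation under stated assumptions: W must be positive definite (roughly, N - n >= p and no constant items), and the function name and synthetic data are hypothetical. It is not Veldman's (1967) program, and because discriminant weights are determined only up to scale, its normalization (v'Wv = 1) will not in general reproduce the magnitudes in Table 2.

```python
import numpy as np


def multiple_discriminant_analysis(X, groups):
    """Minimal multiple discriminant analysis.

    X: (N, p) array of scored responses; groups: length-N group labels.
    Returns (eigenvalues, weights): eigenvalues in descending order and a
    (p, r) weight matrix, one column per discriminant function, where
    r = min(p, n_groups - 1).
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    p = X.shape[1]
    grand_mean = X.mean(axis=0)

    W = np.zeros((p, p))  # within-groups SSCP matrix
    B = np.zeros((p, p))  # among-groups SSCP matrix
    for g in labels:
        Xg = X[groups == g]
        centered = Xg - Xg.mean(axis=0)
        W += centered.T @ centered
        d = Xg.mean(axis=0) - grand_mean
        B += len(Xg) * np.outer(d, d)

    # Solve B v = lambda W v. With W = L L' (Cholesky), u = L' v turns this
    # into the symmetric problem (L^-1 B L^-T) u = lambda u.
    L_inv = np.linalg.inv(np.linalg.cholesky(W))
    eigvals, U = np.linalg.eigh(L_inv @ B @ L_inv.T)

    r = min(p, len(labels) - 1)
    keep = np.argsort(eigvals)[::-1][:r]  # largest discriminant criteria first
    weights = L_inv.T @ U[:, keep]        # back-transform: v = L^-T u
    return eigvals[keep], weights


# Hypothetical data: three groups of 30 subjects, four items, shifted means
rng = np.random.default_rng(0)
means = ([0, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0])
X = np.vstack([rng.normal(loc=m, size=(30, 4)) for m in means])
groups = np.repeat(["officers", "NCOs", "airmen"], 30)
eigvals, V = multiple_discriminant_analysis(X, groups)
percent_of_trace = 100 * eigvals / eigvals.sum()
```

With three groups the sketch returns two functions, consistent with the n - 1 rule; because the weights satisfy V'WV = I, the scores on the two functions are uncorrelated within groups.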



II. METHOD 

Subjects 

Subjects were 100 officers, 90 NCOs, and 99 airmen who were taking courses at the Technical Training Center, Lowry Air Force Base, Colorado. Entire classes were randomly selected from five different technical departments; all Ss had been enrolled in their respective departments for one month or more. This period of time was assumed to be long enough for them to have crystallized some attitudes toward Air Force technical training.

Attitude Form Construction 

As part of a project to identify valid and reliable psychometric measures of student attitudes toward Air Force technical training, the contents of the stimulus items used in this investigation were generated by random samples of students. The content of each of 55 items was structured according to a Likert format (Likert, 1932; Edwards, 1957); each item was randomly placed in its sequential position among the 55 items. In typical Likert fashion, the items called for checking one of five responses: strongly agree, agree, undecided, disagree, and strongly disagree. These five response categories were scored 5, 4, 3, 2, and 1, respectively, for items favorable to Air Force technical training; the scoring of the responses was reversed for items unfavorable to Air Force technical training. This procedure of scoring by assigning arbitrary weights to response categories correlates highly with normal deviate weighting of response categories, and it produces no noticeable differences in reliabilities (Likert, 1932). The total score for an S was determined by summing his scored responses to all the items. In order to avoid any position bias in responding, the agreeable end of the response continuum was alternated randomly from the left to the right among the items.
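The scoring rule just described can be sketched as follows. This is a minimal Python illustration, assuming each response is recorded by its category label (so the left or right placement of the agreeable end on the page does not affect scoring) and that each item is keyed as favorable or unfavorable; the helper names are hypothetical.

```python
# Weights for an item worded favorably toward Air Force technical training
FAVORABLE_WEIGHTS = {
    "strongly agree": 5,
    "agree": 4,
    "undecided": 3,
    "disagree": 2,
    "strongly disagree": 1,
}


def score_item(response, favorable):
    """Score one Likert response; unfavorable items are reverse-scored."""
    weight = FAVORABLE_WEIGHTS[response]
    return weight if favorable else 6 - weight


def total_score(responses, favorable_keys):
    """Sum of item scores for one subject across all items."""
    if len(responses) != len(favorable_keys):
        raise ValueError("one favorable/unfavorable key is needed per item")
    return sum(score_item(r, fav) for r, fav in zip(responses, favorable_keys))


# Hypothetical subject answering three items (the second is unfavorably worded)
example = total_score(["strongly agree", "disagree", "undecided"],
                      [True, False, True])  # 5 + 4 + 3 = 12
```

Reverse-scoring as 6 minus the favorable weight maps 5, 4, 3, 2, 1 onto 1, 2, 3, 4, 5, so a high total always means a favorable attitude.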

Procedure 

The attitude form was presented to the Ss in booklet form, on the cover of which appeared generalized information descriptive of the task. The following page contained a questionnaire concerning some aspects of the S's personal history. Next in sequence was the attitude form, preceded by its specific instructions. Following this form was an open-ended questionnaire designed to give the Ss an opportunity to comment on various aspects of the form and the experimental situation. The booklets were distributed to the Ss in their actual classrooms; typically, classes consisted of approximately 12 students. The booklets were considered to be self-explanatory since they contained all the necessary instructions. While Ss were responding to the attitude form and the questionnaires, they were supervised by an assistant in order to preclude inter-S collaboration or response contamination.



III. RESULTS 

Assuming that the three samples of Ss were drawn from three populations in which the joint distributions of the scored responses to the 55 stimulus items were multivariate normal with equal dispersions, a multiple discriminant analysis (DSCRIM) was performed on the data using Veldman's (1967) computer program. The resulting Wilks' lambda criterion (Λ) for the discriminating power of the attitude items to separate the three groups indicated that the chance was essentially zero that group differences as large as or larger than those obtained would be produced by drawing three random samples from a 55-dimensional multivariate swarm. In Table 1 it can be seen that 100 percent of the trace was accounted for by two discriminant functions. The eigenvectors presented in Table 2 are the coefficients of the discriminant functions. These discriminant weights, together with the correlations listed in Table 3, indicated the consequential contributors to group separation along the first and second functions.
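The summary statistics under Table 1 follow from the eigenvalues alone. The sketch below recomputes them in Python, assuming the standard formulas: Wilks' lambda as the product of 1 / (1 + λ) over the functions, p + g - 2k degrees of freedom for the k-th function's χ² (which matches the 56 and 54 in Table 1), and Rao's F approximation, with N = 287 taken from the note to Table 3. The function name is hypothetical, and the per-function χ² values are not recomputed because the report does not state the exact constant used by Veldman's program.

```python
import math


def discriminant_summary(eigenvalues, n_items, n_groups, n_subjects):
    """Trace, percent of trace, Wilks' lambda, and Rao's F from eigenvalues."""
    trace = sum(eigenvalues)
    percent_of_trace = [100.0 * lam / trace for lam in eigenvalues]

    # Wilks' lambda is the product of 1 / (1 + eigenvalue) over all functions
    wilks = 1.0
    for lam in eigenvalues:
        wilks /= 1.0 + lam

    # Bartlett degrees of freedom for the k-th function: p + g - 2k
    chi2_df = [n_items + n_groups - 2 * k for k in range(1, len(eigenvalues) + 1)]

    # Rao's F approximation to the distribution of Wilks' lambda
    p, q = n_items, n_groups - 1
    denom = p**2 + q**2 - 5
    s = math.sqrt((p**2 * q**2 - 4) / denom) if denom > 0 else 1.0
    m = n_subjects - 1 - (p + q + 1) / 2
    df1 = p * q
    df2 = m * s - df1 / 2 + 1
    root = wilks ** (1.0 / s)
    f_stat = (1.0 - root) / root * df2 / df1

    return {
        "trace": trace,
        "percent_of_trace": percent_of_trace,
        "wilks_lambda": wilks,
        "chi2_df": chi2_df,
        "F": f_stat,
        "F_df": (df1, df2),
    }


# Eigenvalues from Table 1; 55 items, 3 groups, N = 287
summary = discriminant_summary([1.884, 0.878], n_items=55, n_groups=3, n_subjects=287)
```

With the rounded eigenvalues printed in Table 1 this gives a trace of 2.762, Λ of about 0.185, F(110, 460) of about 5.55, and χ² degrees of freedom of 56 and 54. The percent of trace comes out near 68.21 and 31.79 rather than 68.20 and 31.80, presumably because the printed eigenvalues are rounded.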

The aforementioned statistics suggested the following interpretations. Firstly, the original attitude form was sensitive enough to separate significantly three groups of Air Force students who were known to have drastically different attitudes towards military training in general, namely, officers, NCOs, and airmen. Secondly, the total discriminatory power of the items was accounted for by two mutually orthogonal functions. The relative sizes of their eigenvalues established the extent to which the discriminant functions distinguished among the groups. The degree of importance attached to each of these functions in explaining the differences among the groups was estimated by the percentage of the trace attributed to each function. Consequently, the first function (68.20 percent of the trace) was considered to be more important than the second function (31.80 percent) for explaining group separation. Thirdly, the relative contributions of the items to the discriminant functions were determined by the magnitude of the correlations between the items and the discriminant functions. These coefficients were interpreted like factor loadings, so that the two functions were described and named in the context of the attitude items having significant correlations with the functions. Therefore, the
Table 1. Significance of the Discriminant Functions: χ² Approximations

            Percent
Function    of Trace    Eigenvalue    df    χ²         p
I           68.20       1.884         56    273.267    0.000
II          31.80        .878         54    162.664    0.000

Trace = 2.762; Λ = 0.185; F(110, 460) = 5.552; p = 0.000
Table 2. Discriminant-Function Weights for Each Stimulus Item

Item    Function I    Function II
1       -0.0390       -0.0746
2       -0.0869       -0.1260
3       -0.0057        0.1823
4        0.2117        0.1560
5       -0.1240       -0.1027
6       -0.0486        0.1134
7       -0.0385        0.0808
8       -0.0942        0.1516
9        0.0448        0.1204
10       0.2148       -0.1339
11      -0.1225       -0.1138
12       0.1724        0.0184
13       0.1555       -0.0194
14       0.0909       -0.1835
15       0.0595        0.1691
16      -0.0583       -0.0045
17       0.0707       -0.0004
18       0.0219       -0.0638
19      -0.0305        0.1110
20       0.3049        0.0410
21       0.0091       -0.0364
22       0.1322       -0.0912
23      -0.0231        0.0427
24      -0.0601        0.1483
25      -0.0701        0.2030
26      -0.0909       -0.3985
27      -0.0444        0.0455
28      -0.3960        0.3759
29      -0.1873       -0.1129
30      -0.1228        0.0191
31      -0.0520       -0.0971
32      -0.0844       -0.1315
33       0.2022        0.0202
34       0.1106        0.0895
35      -0.0639        0.0008
36       0.0257       -0.2038
37       0.2541        0.1313
38      -0.2126       -0.0905
39      -0.1700        0.0383
40      -0.0072        0.0361
41       0.0834        0.1305
42      -0.0825       -0.1056
43       0.1078        0.1945
44       0.0516        0.1431
45      -0.0454       -0.1320
46      -0.0892       -0.1683
47       0.1823       -0.1022
48      -0.0756       -0.0155
49       0.0814        0.0375
50      -0.2874        0.1890
51      -0.0565        0.0729
52       0.0395       -0.0972
53       0.0827        0.1536
54       0.1846        0.0521
55       0.1558        0.1603



Table 3. Correlations Between Stimulus Items and the Discriminant Functions

Item    Function I    Function II
1        0.2450*       0.0225
2        0.1434       -0.0632
3       -0.0940        0.4503*
4        0.4747*       0.2417*
5       -0.1787        0.1936*
6        0.0349        0.2056*
7        0.2577*       0.1505*
8        0.0220        0.1109
9        0.1277       -0.0239
10       0.2411*       0.0317
11      -0.1143        0.0120
12       0.1503*       0.1085
13       0.3984*       0.0301
14       0.1156        0.0212
15       0.0548        0.2415*
16      -0.1526*       0.1055
17      -0.0039        0.1581*
18       0.2226*      -0.0198
19      -0.0965        0.2110*
20       0.6347*       0.1113
21       0.0803        0.0578
22       0.2112*      -0.1434
23       0.0045        0.1095
24      -0.1026        0.1816*
25       0.0207        0.1445
26       0.0444       -0.0551
27       0.0045        0.2112*
28      -0.3708*       0.6003*
29       0.0060        0.0917
30       0.1208       -0.0277
31       0.1468        0.0648
32      -0.0114        0.0094
33       0.2287*       0.2183*
34       0.0847        0.0783
35       0.0753        0.0116
36       0.0915       -0.0543
37       0.4616*       0.0807
38      -0.1965*      -0.0228
39      -0.0288        0.0974
40       0.227S*       0.0752
41      -0.0837        0.3024*
42      -0.1638*       0.0163
43       0.2114*       0.1184
44       0.0950        0.2584*
45       0.2934*      -0.1828*
46       0.1506*      -0.2132*
47       0.3801*       0.0053
48      -0.0615        0.3093*
49       0.0595        0.3236*
50      -0.2046*       0.2800*
51      -0.1212        0.1240
52       0.3245*       0.2385*
53       0.3788*       0.5010*
54       0.2448*       0.1745*
55       0.4434*       0.3659*

*|r| ≥ 0.148 is significant at the .01 level, N = 287.
Fig. 1. Group centroids in the discriminant space.



first and second discriminant functions were labelled Training Management and Training Impressions, respectively. Lastly, another result of multiple discriminant analysis is that the group centroids computed relative to the functions are separated from each other to a maximum degree. Considering the two discriminant functions as axes of a Cartesian coordinate system, the group centroids were plotted with reference to these axes in Figure 1. It can be seen that the officers sampled manifested more favorable attitudes towards training management than the NCOs, and the airmen sampled manifested more unfavorable attitudes towards training management than either the officers or the NCOs. Also depicted is the fact that the NCOs demonstrated more favorable training impressions than the officers and the airmen. Surely, these final findings hardly seem surprising to soldiers, past and present.
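Both quantities used in this interpretation, the item-function correlations of Table 3 and the group centroids plotted in Figure 1, can be obtained from the discriminant scores. The sketch below is a minimal NumPy illustration on synthetic data; the data, group labels, and the function name `centroids_and_structure` are hypothetical. It assumes the correlations are ordinary total-sample Pearson correlations between each item and each function's scores, since the report does not say whether total-sample or pooled within-groups correlations were used. A compact discriminant solver is included so the block runs on its own; the sign of each function is arbitrary, so a function may need to be reflected before it is named.

```python
import numpy as np


def centroids_and_structure(X, groups):
    """Group centroids in discriminant space and item-function correlations."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    p = X.shape[1]
    grand_mean = X.mean(axis=0)
    W = np.zeros((p, p))  # within-groups SSCP matrix
    B = np.zeros((p, p))  # among-groups SSCP matrix
    for g in labels:
        Xg = X[groups == g]
        centered = Xg - Xg.mean(axis=0)
        W += centered.T @ centered
        d = Xg.mean(axis=0) - grand_mean
        B += len(Xg) * np.outer(d, d)
    # Discriminant weights from B v = lambda W v (Cholesky reduction)
    L_inv = np.linalg.inv(np.linalg.cholesky(W))
    eigvals, U = np.linalg.eigh(L_inv @ B @ L_inv.T)
    keep = np.argsort(eigvals)[::-1][: min(p, len(labels) - 1)]
    V = L_inv.T @ U[:, keep]

    scores = X @ V  # one column of discriminant scores per function
    centroids = {g: scores[groups == g].mean(axis=0) for g in labels}
    # Structure coefficients: correlation of each item with each function
    structure = np.array([
        [np.corrcoef(X[:, i], scores[:, k])[0, 1] for k in range(V.shape[1])]
        for i in range(p)
    ])
    return centroids, structure


# Hypothetical data: three groups of 40 subjects, four items
rng = np.random.default_rng(1)
means = ([0, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0])
X = np.vstack([rng.normal(loc=m, size=(40, 4)) for m in means])
groups = np.repeat(["officers", "NCOs", "airmen"], 40)
centroids, structure = centroids_and_structure(X, groups)
```

Plotting each group's two centroid coordinates against the two function axes gives a picture of the kind shown in Figure 1; with equal group sizes, the centroids spread further along the first function than along the second.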

IV. DISCUSSION

DSCRIM identified those items included in the 
original attitude form which were sensitive enough 
to distinguish among the preseparated groups. It 
demonstrated that the total discriminatory power 
of these items was accounted for by two 
orthogonal functions. It established the relative 
contributions of the items to each discriminant 
function. Also, it ascertained the positions of the 
groups with respect to the discriminant axes. 
These distinct determinations, derived from one 
DSCRIM, would have required the implementation 
of different item analysis procedures in con- 
junction; namely, item validity indices which 
demonstrate how well an item discriminates with 
the rest of the attitude form, correlates with some 
internal or external criterion, and weights several 
unidimensional continua. DSCRIM was capable of 
identifying these indices, plus designating the 
positions of preseparated groups relative to the 
orthogonal axes along which they differed, all in 
one computer run. 

Partitioning of the discriminatory power of the 
original set of attitude items into independent 
components reduced item space dimensionality 
without substantial loss of information. The aforementioned
techniques of item analysis which are currently used to estimate
item validity indices do not even establish dimensionality, let alone
reduce it. DSCRIM not only
determined the orthogonal axes of the item space, 
but also it determined how well items differ- 
entiated along each axis. Knowledge of the 
relationship of an item to an orthogonal function 
which distinguished preseparated groups yielded 
information concerning the number and kind of 
discriminations the item was capable of making. 
DSCRIM emphasized the extent to which items 
separated the criterion groups along the derived 
dimensions of the attitude space. In addition to 
denoting which of the original items were 
responsive enough to differentiate among the three 
criterion groups, DSCRIM defined orthogonal 
continua which underlay the discriminations 
among the group attitudes. 

When the original attitude form was tried out, 
it included many items which had only an osten- 
sible relationship to one another. Initially, items 
were not held together by any underlying con- 
tinua, but rather by the superficial fact that all the 
items referred to statements about likes and dis- 
likes regarding the attitude universe. There might 



have been very little uniformity in terms of what 
the individual items purported to measure. Con- 
venience and comparable contents of the items 
might have made it desirable to include all of them 
in the final form. There was no reason, though, to 
think of all the items as constituting a homo- 
geneous attitude form measuring the same 
common factor in all individuals. To say the least, 
it is nice to know, when constructing an attitude
form or aptitude test, to what degree a particular 
item measures the same orthogonal dimension as 
other items. After all, a scale score is meaningful 
only to the extent that each item contributes to 
the measurement of a single homogeneous 
function. By determining the discriminant axes 
along which the criterion groups differed, 
DSCRIM also established homogeneous clusters of 
items, like factor analysis. The correlations between
items and discriminant functions were interpreted 
in a manner similar to factor loadings. Those items 
which significantly weighted these independent 
components were retained for inclusion within the 
final form. Not only did DSCRIM disclose homo- 
geneous dimensions, but also it designated 
simultaneously those dimensions with respect to 
which items operated as discriminators to 
maximize distance among criterion groups. 
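The retention rule described above can be made concrete with a few rows of Table 3. The sketch below is a hypothetical Python illustration: it keeps an item if its correlation with either function reaches the significance threshold given in the note to Table 3 (|r| >= 0.148), and, as an added assumption not stated in the report, it assigns an item significant on both functions to the function on which its correlation is larger in absolute value.

```python
# (r with Function I, r with Function II) for a few items, copied from Table 3
TABLE_3_SUBSET = {
    1: (0.2450, 0.0225),
    2: (0.1434, -0.0632),
    3: (-0.0940, 0.4503),
    4: (0.4747, 0.2417),
    20: (0.6347, 0.1113),
    28: (-0.3708, 0.6003),
}
CRITICAL_R = 0.148  # |r| >= 0.148 significant at .01, N = 287 (Table 3 note)


def select_items(correlations, critical_r=CRITICAL_R):
    """Split items into per-function clusters and a list of dropped items."""
    clusters = {"I": [], "II": []}
    dropped = []
    for item, (r1, r2) in sorted(correlations.items()):
        if max(abs(r1), abs(r2)) < critical_r:
            dropped.append(item)  # not significant on either function
            continue
        # Assumption: assign to the function with the larger |r|
        clusters["I" if abs(r1) >= abs(r2) else "II"].append(item)
    return clusters, dropped


clusters, dropped = select_items(TABLE_3_SUBSET)
# clusters == {"I": [1, 4, 20], "II": [3, 28]}, dropped == [2]
```

Item 2 falls just short of the threshold and is dropped, while items 4 and 28, which are significant on both functions, are placed with the function on which they load more heavily.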

The validity of a scale score may be sub- 
stantially increased by the optimal weighting of 
each of its composite items. According to 
Thorndike (1969), the only justification for 
arriving at a special weighting scheme is in terms of 
the empirical validity of a cluster of items. The 
effectiveness of the items in discriminating with 
respect to some external criterion must be a 
paramount consideration before assigning item 
weights. DSCRIM provided information regarding 
the differential weighting of items for scoring the 
final attitude form. Not all items were found to 
have significant correlations with each of the dis- 
criminant functions which maximized the 
separation among criterion groups. The proposed 
DSCRIM procedure for weighting items has a 
distinct advantage over the multiple regression 
procedure for weighting items. An inherent 
characteristic of DSCRIM is that it maximizes 
intergroup distances by optimally weighting linear 
combinations of individual items. Therefore, 
scoring weights derived from DSCRIM are better able
to detect attitudinal differences
along the independent dimensions of criterion 
discrimination than are scoring weights derived 
from multiple regression analysis. 
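As an illustration of how discriminant weights could score a form, the sketch below applies the Table 2 weights for items 1 through 4 to one hypothetical subject's scored responses and compares the result with the ordinary unit-weighted Likert sum. It assumes the Table 2 coefficients apply directly to the 1-to-5 item scores, which the report does not state; a real application would use all retained items rather than four.

```python
# Discriminant-function weights for items 1-4, copied from Table 2
WEIGHTS = {
    "I": {1: -0.0390, 2: -0.0869, 3: -0.0057, 4: 0.2117},
    "II": {1: -0.0746, 2: -0.1260, 3: 0.1823, 4: 0.1560},
}


def weighted_score(responses, weights):
    """Linear composite: sum of weight * scored response over the items."""
    return sum(weights[item] * score for item, score in responses.items())


# Hypothetical subject's scored (1-5) responses to items 1-4
responses = {1: 4, 2: 2, 3: 5, 4: 3}
unit_weighted = sum(responses.values())              # 14
score_I = weighted_score(responses, WEIGHTS["I"])    # about 0.2768
score_II = weighted_score(responses, WEIGHTS["II"])  # about 0.8291
```

Unlike the unit-weighted sum, which counts every item equally, the two composites place the subject separately on each discriminant dimension, with items of small or negative weight contributing little or pulling the score down.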

V. SUMMARY AND CONCLUSIONS

It can be seen that DSCRIM extracted many
different kinds of information from the data 
which normally would have been obtained from 
the implementation of numerous item analysis 
techniques. To recapitulate, DSCRIM achieved the 
following results: It disclosed those stimulus items 
which were responsive enough to discriminate 
among criterion groups; it partitioned the total 
discriminatory power of the items into two homo- 
geneous components; it yielded data for arriving at 
a special weighting scheme for scoring the final 



form; and it located the positions of the criterion 
groups relative to the two orthogonal dimensions 
of the attitude universe. Essentially, DSCRIM 
produced useful information regarding the 
selection of the best items from which to compose 
the final attitude form. Not only did DSCRIM 
establish several distinct item validity indices, but 
also it did so more efficiently (it derived these 
different indices simultaneously with minimum 
effort) and more effectively (it practically 
exhausted the information obtainable from the 
data concerning item selection) than other item 
analysis procedures. 
APPENDIX: TYPING AND COMPUTATIONAL PROCEDURES
FOR THE AUTOMATED READABILITY INDEX

For effective use of the ARI method of measuring readability, the equipment must be accurate and the typing procedures correct. This appendix contains a set of general instructions and a practice sentence and paragraph for which the number of strokes, words, and sentences are known. A typist using the method should practice the modified typing procedure and check the apparatus before proceeding, as well as make periodic checks on the apparatus throughout the data tabulation and collection phase. The need for such instruction was confirmed by a feasibility test of the Automated Readability Index apparatus conducted by Air Training Command (ATC Project Report 69-22).



General Instructions for Computing the Automated Readability Index 



Selecting Samples 

1. Only books and passages containing straight narrative material can be used. Skip unusual 
material such as poems, listings, math, etc. 

2. For each book, the equivalent of 10 full pages comprises the sample. Start at the beginning of a paragraph and end at the end of a paragraph; this means that the material copied will not physically appear on one page in the text, but will be approximately one page in length.

3. For each page typed, indicate the source and page. On the data sheet of each source, give all of the information needed for citing in a bibliography. Typed pages need only enough information to identify the source and the page within the source.

4. Select 10 pages, spaced relatively evenly through the book. Try to avoid the first and last pages of chapters. For example, if the book has 325 pages, type page 16, then every 32nd page. However, alternate between left- and right-hand pages (even and odd). Don't be overly compulsive and waste undue time deciding exactly which page to use; in general, get 10 pages that will represent the full book, remembering that some books change markedly as you progress through them, e.g., the latter portions are much more difficult than the beginning.
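The page-spacing arithmetic in step 4 (a 325-page book: start at page 16, then every 32nd page) can be sketched as below. This is a hypothetical Python helper, not part of the original procedure: it takes the interval as the page count divided by 10 (rounded down), starts half an interval in, and shifts a page forward by one (or back, at the end of the book) so that even and odd page numbers alternate. It knows nothing about chapter boundaries and does not check for duplicate pages in very short books.

```python
def sample_pages(total_pages, n_pages=10):
    """Evenly spaced page numbers, alternating even and odd page numbers."""
    if total_pages < n_pages:
        raise ValueError("book has fewer pages than the sample requires")
    interval = total_pages // n_pages  # 325 // 10 = 32
    start = max(1, interval // 2)      # 32 // 2 = 16
    pages = []
    for k in range(n_pages):
        page = start + k * interval
        # Even positions in the sample get even page numbers, odd get odd.
        if page % 2 != k % 2:
            page = page + 1 if page + 1 <= total_pages else page - 1
        pages.append(page)
    return pages


print(sample_pages(325))
# [16, 49, 80, 113, 144, 177, 208, 241, 272, 305]
```

In practice the typist would still nudge any selected page that falls on the first or last page of a chapter.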

Typing Instructions

1. Very few changes from standard typing are required. Use common sense to formulate rules. 
You must end with a count of 

a. the number of letters on the page. 

b. the number of words on the page. 

c. the number of sentences on the page. 

2. Getting the number of letters is fairly simple if you remember that such things as punctuation 
marks are included in the count. They should be counted. However, notice that the spelling of a word is 
immaterial, as long as it contains the proper number of letters. Please don't make corrections or
strike-overs. REMEMBER YOUR TASK IS TO OBTAIN COUNTS AND ONLY THE COUNTING IS 
IMPORTANT. The tabulator will register each time you type a letter. 

3. Counting the words is done by counting the number of times the space bar is used. This requires 
a few simple changes in typing. 

a. Do not double space after a sentence; to do so would add one to the word count.

b. Do not use the space bar to indent. 

c. IMPORTANT: You must space once at the end of each typed line in order to count the last word. It would be helpful to space and then put a 7 after the last word in each typed line, except when a period occurs. This will ensure that you have counted the last word. Simply subtract the number of 7's from the final word count.

d. Count the numbers that enumerate items within a sentence as words, e.g., 1 2 3. . .
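Once the counts of strokes, words, and sentences are in hand, the index itself is simple arithmetic. The sketch below assumes the commonly published Automated Readability Index formula, grade level = 4.71 (strokes per word) + 0.5 (words per sentence) - 21.43, which this appendix does not itself state; it also applies the word-count correction from instruction 3c (subtracting the 7's typed to mark line ends). The function names and the sample counts are hypothetical.

```python
def corrected_word_count(space_bar_count, line_end_sevens):
    """Word count from the space-bar tally, less the 7's typed as markers."""
    return space_bar_count - line_end_sevens


def automated_readability_index(strokes, words, sentences):
    """Grade-level estimate from stroke, word, and sentence counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 4.71 * (strokes / words) + 0.5 * (words / sentences) - 21.43


# Hypothetical page: 112 space-bar presses, 12 line-end 7's,
# 470 strokes, and 5 sentences
words = corrected_word_count(112, 12)             # 100
ari = automated_readability_index(470, words, 5)  # about 10.71
```

The stroke and sentence counts come straight from the tabulator; only the word count needs the 7's correction before the formula is applied.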